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ABSTRACT Cichlid fishes are an excellent model system for studying speciation and the formation of KEYWORDS 
adaptive radiations because of their tremendous species richness and astonishing phenotypic diversity. Midas cichlid 
Most research has focused on African rift lake fishes, although Neotropical cichlid species display much double-digest 
variability as well. Almost one dozen species of the Midas cichlid species complex {Amphilophus spp.) have RADSeq 
been described so far and have formed repeated adaptive radiations in several Nicaraguan crater lakes. synteny 
Here we apply double-digest restriction-site associated DNA sequencing to obtain a high-density linkage segregation 
map of an interspecific cross between the benthic Amphilophus astorquii and the limnetic Amphilophus distortion 
zaliosus, which are sympatric species endemic to Crater Lake Apoyo, Nicaragua. A total of 755 RAD markers RAD markers 
were genotyped in 343 F2 hybrids. The map resolved 25 linkage groups and spans a total distance of 1427 cM mutation rate 
with an average marker spacing distance of 1.95 cM, almost matching the total number of chromosomes 
(n = 24) in these species. Regions of segregation distortion were identified in five linkage groups. Based on 
the pedigree of parents to F2 offspring, we calculated a genome-wide mutation rate of 6.6 x 10~^ mutations 
per nucleotide per generation. This genetic map will facilitate the mapping of ecomorphologically relevant 
adaptive traits in the repeated phenotypes that evolved within the Midas cichlid lineage and, as the first 
linkage map of a Neotropical cichlid, facilitate comparative genomic analyses between African cichlids, 
Neotropical cichlids and other teleost fishes. 



Cichlid fishes are a well-known evolutionary model system for study- 
ing the mechanisms of speciation and the formation of adaptive radi- 
ations. African Rift Lake cichlids in particular display an astonishing 
variability in phenotypes and comprise an enormous number of spe- 
cies, which makes Cichlidae one of the most species-rich families in 
vertebrates (Fryer and lies 1972; Salzburger and Meyer 2004; Berra 
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2007). Because of this mega-diversity, Rift Lake African cichlids have 
been the primary focus of genomic research in this family to date 
(reviewed in Fan et al. 2012), but the >60 million year divergent 
(Matschiner et al. 2011) Neotropical lineage of cichlids is pertinent 
if we are to understand cichlid diversity at the family level and in 
a phylogenetic context. For example, the Neotropical Midas cichlid 
lineage alone features some of the key traits that characterize Afi'ican 
cichlid fish diversity: polymorphisms in color, jaw, Up, tooth, and body 
shape (Meyer 1990a,b; Elmer et al. 2010b). 

Because of their geographic distribution, which includes the two 
Nicaraguan Great Lakes and the independent colonizations of at least 
eight crater lakes of Nicaragua, Midas cichlid communities comprise 
repeatedly evolved populations at different evolutionary stages of local 
adaptation and speciation (Wilson et al. 2000; Barluenga and Meyer 
2010). The two oldest crater lakes house most of the described species 
(9 of 11). Interestingly, some of the younger crater lakes harbor no- 
tably different ecotypes in sympatry that are genetically not differen- 
tiated from each other {e.g., Elmer et al. 2010c). What makes this 
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system so interesting is that similar morphologies, such as elongated 
body shapes or hypertrophied lips, have evolved independently and 
repeatedly in different crater lakes (Elmer et al. 2010b; Manousaki 
et al. 2013). These two key aspects, (1) the crater lakes as "natural 
experiments" that contain replicates of cichlid evolution at various 
stages of diversification, and (2) the ecomorphological variability that 
evolved in different environments in parallel, facilitates the investiga- 
tion of mechanisms and discovery of principles of how and which 
genetic material underlies adaptation and the formation of new spe- 
cies (Ekner and Meyer 2011). For example, knowing the genetics that 
lead to phenotypic changes via quantitative trait loci (QTL) mapping 
and association studies in closely and more distantly related species 
will ultimately help to elucidate how selection has led to the astonish- 
ing biodiversity of African and Neotropical cichlids observed today. 

In at least two instances, Midas cichlid radiations have formed 
repeatedly in Nicaraguan crater lakes; specifically the independent 
formation of several high-bodied endemic species that occupy benthic 
niche and one elongated limnetic species each in two lakes (Vivas and 
McKaye 2001; Barluenga et al. 2006; Elmer et al. 2010b). Apart from 
body shape, limnetic and benthic species also differ in diet, pharyngeal 
jaw morphology, behavior, and coloration (Barlow and Munsey 1976; 
Bayhs 1976a,b; Vivas and McKaye 2001; Barluenga et al. 2006; Ekner 
et al. 2010b; Lehtonen et al. 2011). In Lake Apoyo, which is the oldest 
of all Nicaraguan crater lakes at 22,000 years of age, speciation into 
these two distinct ecotypes has occurred in situ in the form of sym- 
patric species by ecological speciation (Barluenga et al. 2006). Evi- 
dence suggests that benthic-limnetic divergence in XUoa was also in 
sympatry (Kautt et al. 2012). The linkage mapping analysis in this 
study is based on a cross of the sympatricaUy occtirring limnetic 
Amphilophus zaliosus and benthic Amphilophus astorquii from crater 
Lake Apoyo. 

There is a paucity of evidence about the genetic basis of early 
divergence between two lineages as they emerge from a common 
ancestor (reviewed in Coyne and Orr 2004; NosU and Schluter 2011). 
For species that arose through aUopatric speciation without exchang- 
ing genetic material, the random accumulation of genetic incompat- 
ibilities by time might explain their reproductive isolation. However, 
especially in stages of speciation with gene flow, it is of great interest to 
ask (1) whether few or many genes and where they are located, e.g., 
clumped, in the genome; and (2) whether the same type of genes are 
involved in the process of speciation (Ekner and Meyer 2011; Feder 
et al. 2012). In only a few wild vertebrate populations have these 
questions been addressed via gene mapping and QTL mapping stud- 
ies; thus, only very few crucial genes responsible for ecological diver- 
gence (e.g., Colosimo et al. 2005; Chan et al. 2010) and speciation 
(Coyne and Orr 2004; Nosil and Schluter 2011) have been identified 
so far. The development of a Midas cichlid linkage map will aid sub- 
stantially in identifying the genes that underlie phenotypically and 
ecologically important traits in fishes early during their adaptations 
to local conditions and the process of incipient speciation. 

Researchers have long questioned why some cichlid fishes evolve 
in such a diverse and fast manner compared with other vertebrate 
lineages (Meyer 1990a; Meyer 1993; Wittbrodt et al. 1998), or even 
other fishes in the same environment (Elmer et al. in press). Behav- 
ioral and morphological characters that have been proposed to be 
responsible for the elevated speciation rate of cichlids include parental 
care, sexual selection, functional design, and phenotypic plasticity and 
polymorphisms (Meyer 1987; Stiassny and Meyer 1999; Salzburger 
2009). A particular genomic architecture, perhaps derived from 
whole-genome dupUcations, has been proposed as a driver of rapid 
response to selection and higher diversification rates in fishes in 



general and perhaps cichlids in particular (Meyer and Scharfl 1999; 
Kuraku and Meyer 2008). The evolution of a cichlid-specific physical 
order of particular genes along the chromosomes, and certain inter- 
actions between these genes, might have resulted in a genomic archi- 
tecture in cichlid lineages that favored diversification (Salzburger 
2009). As an alternative mechanism hypermutability of the genome 
or specific genomic regions might allow organisms to adapt at a higher 
pace to changing selection pressures. Through a greater neutral mu- 
tation rate, the chance that beneficial mutations might emerge is 
enhanced, therefore allowing neofunctionalization to occur faster, as 
has been proposed for the facilitation of morphological diversity in 
teleosts (Ravi and Venkatesh 2008). 

The mechanisms responsible for the incomparable diversity of 
cichlids are unknown. Only an accumulation and comparison of more 
and more analyzed genomic resources, such as transcriptomes, whole 
genomes, and linkage maps of cichlids and related organisms can shed 
light on these unresolved questions. Some progress has already been 
made in this direction (Albertson et al. 2003; Lee et al. 2005; Sanetra 
et al. 2009; Ekner et al. 2010a; Fan et al. 2012; Cichlid Genome 
Consortium, 2006). The present study seeks to contribute to this 
much-needed resource and analysis with a Neotropical lineage and, 
based on the pedigree of parents to F2 offspring, give an estimate of 
a genome-wide mutation rate. 

For decades, the analysis of genomic data were restricted to 
a few model organisms and organisms of economical importance. 
Next-generation sequencing platforms allow for the cost-effective 
sequencing of whole or partial genomes in a relatively short time; 
this is greatly accelerating genomic research in non-model 
organisms (Stapley et al. 2010; Elmer and Meyer 2011). Apart from 
whole genome sequencing, with restriction site-associated DNA 
sequencing (RADSeq) a powerful new tool to examine the genomes 
of organisms has become available (Baird et al. 2008; reviewed in 
Davey and Blaxter 2010). The advantage of RADSeq is the reduc- 
tion of the genomic information of an organism to a usable and 
comparable fraction of homologous single nucleotide polymor- 
phisms across a set of individuals. By choosing enzymes that differ 
in their cutting frequency, RADSeq allows for the development of 
variable numbers of repeatable genetic markers (RAD markers) to 
be sequenced by lUumina next-generation sequencing technology 
(Baird et al. 2008). RAD markers are polymorphic loci {i.e., 
sequences with at least one single-nucleotide polymorphism [SNP]) 
within or between individuals. Here, we apply a newly developed 
double-digest RADSeq (ddRADSeq) protocol (based on Peterson 
et al. 2012) to create a linkage map of various species of Midas 
cichlids. 

In the present study, we construct a Unkage map with the F2 
progeny of an interspecific cross between the benthic A. astorquii 
and the limnetic A. zaliosus Midas cichlids as a step toward uncover- 
ing the genetic basis of adaptive traits in this species complex. The 
linkage map will improve genomic resources for cichlid fishes in 
general, and Neotropical cichlids in particular, and along with 
whole-genome sequencing, contribute to insights into how cichlids 
achieved their astonishing phenotypic variability. 

MATERIALS AND METHODS 
Pedigree 

A wild-caught male A. zaliosus (2007, Lake Apoyo, Nicaragua) and 
a wild-caught female A. astorquii (2007, Lake Apoyo, Nicaragua) were 
reared separately in the laboratory until sexual maturity and then 
crossed to obtain Fi hybrid progeny. The offspring were raised to sexual 
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maturity in a community tank to allow fuU-sib pairing. When a pair 
formed, they were moved to a separate tank and after spawning natu- 
rally their eggs were harvested and reared in another tank. F2 offspring 
were killed when they reached approximately 2 cm total length to get 
enough tissue for DNA extraction. The Fi pair that had the greatest 
number of F2 offspring was chosen for construction of the linkage map. 

ddRADSeq library preparation and sequencing 

Genomic DNA was extracted from the two parents ("P generation"), 
the Fi hybrid pair and 347 F2 individuals using the QIAGEN DNeasy 
Blood & Tissue Kit following the manufacturer's instructions includ- 
ing RNase A treatment. Then, 1 |ji.g of DNA template from each 
individual was double-digested using the restriction enzymes Pstl- 
HF (20 units/reaction) and Mspl (20 units/reaction) in one combined 
reaction for 3 hr at 37°. Subsequently, each fragmented sample was 
purified using the QIAGEN MinElute Reaction Cleanup Kit, eluted in 
20 |xL of Elution Buffer (EB), and quantified. Fragments were then 
ligated to PI adapters that bind to Pstl-HF created restriction sites and 
P2 adapters binding to overhangs generated by Mspl (see Supporting 
Information, Table SI for adapter sequences). In each reaction, -0.4 (xg 
of DNA, 1 jjuL of PI adapter (10 [jlM), 1 |xL of P2 adapter (10 |xM), 
1 jjlL of T4 ligase (1,000 U/|jlL), 4 |jlL of lOx T4 ligation buffer, and 
double-distiUed water were combined to a total volume of 40 [lL. The 
ligation was processed on a polymerase chain reaction (PGR) machine 
using the following conditions: 37° for 30 min, 65° for 10 min, followed 
by a decrease in temperature by 1.3° per minute until reaching 20°. 

After we ligated each individual's DNA to the adapters, samples 
were pooled and size-selected from an agarose gel. A preliminary 
lUumina sequencing run containing the parental and Fj DNA revealed 
that the pilot analysis gel size range of 250—500 bp resulted in se- 
quencing too many fragments and therefore a range of -330—400 bp 
was chosen for subsequent libraries. Manual size selection from aga- 
rose gel electrophoresis was performed on the library containing the 
parental and Fi samples and samples were then cleaned using the 
QIAGEN MinElute Gel Purification Kit and eluted in 10 [jlL of EB. 
Fifty F2 individuals were pooled per library and size selected using 
Pippin Prep technology (Sage Science, Beverly, MA). Seven PCRs 
(several are necessary to minimize PGR bias) were performed in rep- 
licate on size-selected fragments [(Peterson et al. 2012) Figure 1]. Each 
PGR contained 10-20 ng of library DNA template, 4 |ji,L of dNTPs 
(100 mM), 4.0 jjiL of 5x Phusion HE buffer (NEB), 0.2 |xL of Phusion 
Taq polymerase (NEB), 1.0 (jlL of each RAD primer (10 [jlM), and 
filled up to 20 jjuL with double-distilled water. PGR conditions were 
performed as follows 98° (30 sec), [98° (10 sec), 65° (30 sec), 72° 
(30 sec)] X 10, 72° (300 sec). Gel electrophoresis was performed on 
pooled products to remove remaining oligo dimers and other con- 
taminants. Fragments ranging from 330 to 400 bp were excised from 
a gel after electrophoresis and cleaned using the QIAGEN MinElute 
Gel Extraction Kit and eluted in 10 |xL of EB. The eight libraries (one 
containing the parental and Fi DNA and the remainder seven each 
containing 50 F2 individuals) were diluted to 1 nM and sequenced 
single end to 150 bp length with the lUumina TruSeq SBS Kit v5 in 
eight lanes on an lUumina Genome Analyzer IIx (GeGKo: Genomics 
Genter University of Konstanz) over two consecutive runs, each con- 
taining a single PhDC control lane. 

Linkage mapping analysis 

Bioinformatic processing of the reads {i.e., RADSeq tags) was per- 
formed using Stacks (Catchen et al. 2011). Reads were trimmed to 
a length of 110 bp, cleaned from erroneous and low-quality reads, and 



sorted by individual. Processing of the 347 F2-offspring individuals 
was performed using the denvo_map.pl script, which creates stacks of 
loci (i.e., RADSeq stacks) within individuals, followed by building 
a catalog of loci present in the two parents and then mapping all 
stacks from the offspring to the parental catalog and finally creating 
the offspring genotypes for those loci. Multiple SNPs in one RADSeq 
stack were treated as a single polymorphic marker (i.e., one RAD 
marker). The parameters were optimized and set as follows: 
the minimum amount of identical reads to create a stack in each 
parent and progeny was set to three ( — m 3, — P 3) the number of 
mismatches allowed between loci from the parents was set to three 
( — n 3), and highly repetitive RAD t^s were removed or disjointed 
( — t), other parameters were set to default. 

JoinMap 4 (Van Ooijen 2006) was used to perform linkage-map- 
ping analysis. Genotypes were exported from Stacks and were filtered 
to only include those markers for which there were genotype calls in at 
least 90 F2 individuals with a coverage of at least 5x. Linkage analyses 
were performed using all markers (of the type aa/bb, aa/ab, ab/aa, ab/ 
ab, ab/ac, ab/cc in the P generation as defined by Stacks) containing 
alleles from the Fj pair that could be matched with confidence to each 
parental unit and did not deviate from Mendelian segregation ratios. 
Markers under segregation distortion (P = 0.05 — 0.001) were added 
step by step from the less significant to the most and their effect on the 
map was checked. Marker groupings were calculated using an inde- 
pendent logarithm of odds (LQD) score of 6.0, but the nimiber of 
calculated linkage groups did not change between LOD scores of 6.0 
to a more stringent LOD score of 10.0. Each linkage group was cal- 
culated using the regression mapping algorithm and Kosambi's map- 
ping function (Kosambi 1943) with the following thresholds: 
recombination frequency of 0.400, LQD value of 1.0, and a good- 
ness-of-fit jump of 5.00. Marker order was optimized by performing 
a ripple function after each added locus. 

Comparative analysis 

AH RAD markers used in the linkage map were indexed with the 
respective linkage group and position and compared with the genomes 
of the fishes tilapia {Oreochromis niloticus; version from 25.01.2012), 
stickleback {Gasterosteus aculeatus; version BRQADS1.61), and medaka 
{Oryzias latipes; version MEDAKA1.48) using blastn searches and an 
e-value cut-off of 10~^^ for tilapia (a basal cichlid) and 10"'" for 
stickleback and medaka. Tilapia was used as a representative of the 
African cichlid lineage, whereas stickleback and medaka were used 
as phylogenetically close teleost outgroups to the cichlids. If a marker 
mapped to more than one locus it was either excluded or, if the 
difference in the bit score between best hit and second-best hit was 
more than 20, the best hit was retained. Markers that mapped to 
unlinked scaffolds were excluded from the synteny analyses. Synteny 
was calculated in percentage of markers mapping to a single orthol- 
ogous linkage group in contrast to those mapping to different link- 
age groups. Markers that solely map to a single linkage group, i.e., if no 
other markers mapped to that same linkage group, were excluded. 

Mutation rate estimation 

De novo mutations are expected to be recognized as high- confidence 
SNPs present in one or a few offspring but not in the parents. Loci 
from two Fi individuals and eight randomly chosen F2 individuals 
with high coverage were compared to the parental (P) loci and 
checked for SNPs only present in the progeny. The coverage for a locus 
to be included in the analysis was set to at least 8x (minimally 4x per 
allele to reduce the chance of candidate new mutations being sequencing 
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Isolation and cleaning 
of fragments 



PCR amplification of fragments ligated 
to PI and P2 adapters (RADSeq tags) 



locus 1 



AACCTTGCAGCGATCGGGCTTAnAGCTTAAACGAACTAGCT 
AACCTTGCAGCGATCGGGCTTATTAGCTTAAACGAACTAGCT 
AACCTTGCAGCGATCGGGCTTATTAGCTTAAACGAACTAGCT 
AACCTTGCAGCGATCGGGCnATTAGCTTAAACGAACTAGCT 
ATGGGTGCAGCGATCGGGCTTATTAGCTTAAACGAACTAGCT 
ATG&GTGCAGCGATCGGGCTTAnAGCTTAAACGAACTAGa 
ATGGGTGCAGCGATCGGGCTTATTAGCTTAAACGAACTAGCT 

locus 2 , . r , ... 

ATGGGTGCAGTGGCGATGATCGTAGCAACGAItaiCATCCCC Selection of polymorphic 

ATGGGTGCAGTGGCGATGATCGTAGCAACGATdAEcATcccc markers between individuals 

ATGGGTGCAGTGGCGATGATCGTAGCAACGATGAbcATCCCC jOCUS 2 

ATGGGTGCAGTGGCGATGATCGTAGCAACGATQ/^CATCCCC ^ 

AACaTGCAGTGGCGATGATCGTAGCAACGATOGbcATCCCC — "^$2S?5J?5J?™"S™?"JS"3$^"S2 

«irrTTrr«rTrrrr«TraTrrT«rrnrr.Tr^Ur«Trrrr AACCTTGCAGTGGCGATGATCGTAGCAACGAT(^GpCATCCCC 
AACCTTGCAGTGGCGATGATCGTAGCAACGATGGpCATCCCC 

AACCTTGCAGTGGCGATGATCGTAGCAACGATqGfiCATCCCC 



Figure 1 Schematic of the ddRADSeq protocol. Arrows 
indicate restriction sites for a rare-cutting enzyme in 
green and a frequent-cutting enzyme in orange. PI 
adapters specifically bind to the rare-cutting enzyme 
(green) and P2 adapters bind to the frequent-cutter 
(orange) restriction sites. Each individual contains a dif- 
ferent barcode of five nucleotide bases specified by the 
PI adapter. Sequenced RAD tags start with the barcode 
(individual one: red barcode, individual two: blue 
barcode). The bioinformatic software Stacks allows for 
the detection of single nucleotide polymorphisms 
between individuals (outlined by red box in sequence 
alignment). 



errors). In addition, an empirical distribution of allele-to-allele coverage 
ratio was estimated from all heterozygous RAD markers in the F2 
generation. New mutation candidates in the progeny were checked 
against this distribution, and only those mutations that had an allele 
ratio within the 95% confidence interval of the loci present in the 
empirically estimated distribution were retained. This step yielded loci 
with a ratio lower than 0.5 or greater than 2.0 to be discarded and was 
performed to exclude candidate loci with odd allele-to-allele ratios {e.g., 
repetitive sequences confoimded with alleles). For example, a locus with 
a mutation candidate and 10 reads in total, including seven reads of the 
original allele and three reads of the allele with the new mutation, would 
be excluded because the aUele-to-aUele ratio is lower than 0.5. 

RESULTS 

SNP and marker detection 

A total of 254 million reads were used, with an average of 31.8 million 
reads per library (SD: 3.1 million reads). On average, approximately 
588,000 reads were obtained per individual (SD: 201,454 reads) with 
15x coverage per read (SD: 5.1x). Within the parental A. astorquii 
individual, a SNP frequency of 0.80 SNPs/kb (11,172 SNPs in 139,109 
loci) and for A. zaliosus 0.89 SNPs/kb (12,764 SNPs in 142,740 loci) 
were detected (Table S2). Of 118,213 loci that were scored in both 
parents, 12,429 were polymorphic with one to three SNPs, with a total 
of 15,035 SNPs and a frequency of 1.2 SNPs/kb. Of those loci that 
were polymorphic, 18.5% were fixed between parents. A total of 
38,203 loci were detected in the parents and were present in at least 
90 of the 343 F2 individuals. The number of loci shared between 
parents and F2 progeny was smaller than that between parents and 
Fi because the F2 progeny gel size range was narrower and therefore 
included fewer fragments. A total of 3,184 loci that were shared be- 
tween parents, Fj and F2S were polymorphic and retained for further 



analyses. Of these loci, 755 were informative (alleles that segregated in 
a 1:2:1 segregation ratio in the F2 generation) and could serve for the 
construction of the linkage map (Figure 2). Because different libraries 
did not exactly contain the same loci and some of the loci did not pass 
the coverage threshold, we obtained -25% missing data. Almost 50% 
of the loci were present in more than 300 individuals (Figure SI). 

Linkage groups 

Of the 347 F2 individuals genotyped with ddRADSeq, four were ex- 
cluded because of low coverage (<6x per individual; see Table S3 for 
genotype data). From linkage mapping analysis, 25 linkage groups 
were identified with a total size of 1427 cM and an average marker 
spacing distance of 1.95 cM (Figure 2, Table S4). Linkage groups were 
between 51.5 and 77.1 cM in length except for the two small linkage 
groups LG24 and LG25 with 2.5 and 2.0 cM, respectively. With the 
exception of two small linkage groups with only two and three linked 
markers, all linkage groups consisted of at least 10 markers (mini- 
mimi: 10 markers, maximimi: 54 markers). On average, a typical 
linkage group comprised 30 markers (Table S4). A total of 17 markers 
(2.3% of all markers) were completely linked (no recombination was 
observed between them). The maximum distance between two 
markers on a single linkage group was 30.7 cM on LG20. 

Segregation distortion 

Almost 10% of all markers (74 of 755 markers) deviated from 
Mendelian segregation ratios with a P- value between 0.05 and 0.001 
(Table S3). Markers that were distorted significantly with P < 0.0001 
were not considered because ratios that are too skewed might create 
inaccurate distances and false linkages (Cervera et al. 2001). Closely 
linked markers with similar patterns of segregation distortion proba- 
bly can be attributed to biological causes, rather than genotyping 
error. Of these 74 markers, in fact 19 could be found on LGl (39% 
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63.0 62.9 61.9 



61.1 60.4 59.8 59.7 59.4 



LG17 LG18 LG19 LG20 LG21 LG22 LG23 LG24 
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LG25 



2.0 



58.4 58.0 57.1 



54.6 54.4 53.7 



51.5 



Figure 2 Linkage groups constructed with 755 RAD markers and 
ordered by size. Blocks of segregation distortion are indicated by 
brackets. IVIarkers used for synteny analyses are shown in orange. For 
further information on marker IDs, position, segregation distortion and 
markers used in synteny analyses see, Table S3. 



distorted markers on LGl) with most of them directly strung together. 
In all these markers the ratio was skewed toward a higher frequency of 
the paternal allele. Apart from the large segregation distortion block 
on LGl, four other smaller blocks of segregation distortion were iden- 
tified: five adjacent markers on LG6, three markers in close proximity 
at the distal part of LG15, seven markers on LG23 spanning a distance 
of -15 cM and all three markers on LG24. LG6 and LG24 distorted 
markers exhibit a higher ratio of the maternally inherited allele, 
whereas distorted markers both on LG7 and LG23 display an excess 
of heterozygotes. Thus, half of all distorted markers (37 of 74) are 
placed on these five blocks (Figure 2, Table S3). 

Comparative genomics 

The vast majority of markers present in any given particular Midas 
cichlid linkage group also map to a single orthologous linkage group 
in tilapia (92 of 107 markers that uniquely mapped to the tUapia 



genome, excluding those markers that mapped to a single linkage 
group with only this one syntenic marker), a basal African cichlid that 
last shared a common ancestor at least 60—90 million years ago 
(Salzburger and Meyer 2004). Four linkage groups in tilapia (LGs 1, 
4, 10, and 21) and six linkage groups in Midas (LGs 8, 12, 16, 23, 24, 
and 25) are not represented by more than two markers that map to 
any orthologous linkage group. Midas cichlid linkage group 23 map- 
ped to 14 loci of different scaffolds in the tilapia genome. For the other 
five Midas linkage groups that did not map to orthologous linkage 
groups in tilapia, this is caused by an insufficient number of markers 
used to anchor the tilapia genome (Table S5) or small linkage group 
sizes (due to insufficient data in tilapia genomic resources) in tilapia. 
In addition to the linkage groups that share homologous markers 
(conserved synteny), the respective position of these markers (linkage 
order) is mostly the same, suggesting retained syntenic relationships 
among those loci (Figure 3). However, divergent marker orders, pre- 
simiably caused by intrachromosomal rearrangements, can be detected 
to some extent between the Midas cichlid and tilapia {e.g., LGs 13, 15, 
and 21 in the Midas cichlid). In general, however, it is clear that overall 
levels of synteny between these two cichlid lineages are quite highly 
conserved, with 86% of aU markers mapping to a single orthologous 
linkage group when those linkage groups containing two or more 
markers are considered. 

As expected, fewer homologous markers were found for stickleback 
(44 markers) and medaka (38 markers) compared with tilapia and the 
Midas cichlid (Figure 3). The stickleback shows many markers of con- 
served synteny (78%) with the Midas cichlid. In contrast, fewer markers 
show conserved synteny (38%) between medaka and Midas cichlids. 

IVIutatlon rate 

Two new mutations in the Fi progeny and five new mutations in the 
F2 progeny were found. The same mutation was observed in two 
different F2 individuals and was therefore only counted as one muta- 
tion. The Fi mutation rate was calculated to 7.4 x 10~* and the F2 5.8 
X 10~^, leading to an average mutation rate of 6.6 x 10~^ mutations 
per nucleotide per generation. 

DISCUSSION 

Here, we report the first linkage map of a Neotropical cichlid species. 
It wiU serve as important addition to the genomic resources for cichlid 
fishes and facilitate in searching for genes that contribute to ecological 
and phenotypic diversity in Midas cichlids in particular and cichlids 
more generally. With a total of 25 linkage groups, the present linkage 
map closely matches the number of chromosomes (n = 24) in the 
Midas cichlid (Feldberg et al. 2003). The small groups LG24 and LG25 
might represent larger hnkage groups or collapse with another linkage 
group when further markers can be incorporated. Alternatively, in the 
case of LG24, markers in greater distance might not have been iden- 
tified because of highly distorted segregation ratios rather than miss- 
ing genotypes. Significant parts of a linkage group might be 
completely missing if it contains distorted regions (Cervera et al. 
2001; Douclefif et al. 2004). Alternatively, the distorted markers con- 
stitute false linkages (e.g.. Nelson et al 2010), though at least in the 
case of LG24 this seems improbable given that all three markers are in 
very close proximity and a small number of recombination events 
have occurred between them. 

Several genes associated with phenotypic divergence between cichlid 
species have been identified to date (Kijimoto et al 2005; Kobayashi 
et al 2006; Albertson and Kocher 2006; Salzburger et al 2007; Roberts 
et al 2009; Fan et al. 2011; Roberts et al. 2011). In African cichlids. 
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Figure 3 Oxford grid showing 
relationship between Midas 
cichlid and other teleost fishes. 
Each linkage group (in the case 
of medaka each chromosome) 
is size-normalized to equal one 
and marker positions were 
placed relative to their respective 
linkage group (or chromosome) 
size. Oxford grids showing marker 
relationships between (A) tilapia 
and Midas cichlid, (B) stickleback 
and Midas cichlid, and (C) me- 
daka and Midas cichlid. Color 
refers to markers found on a 
Midas linkage group. 



a specific linkage group (LG5) has been proposed to constitute a 
"speciation chromosome" containing genes responsible for sex deter- 
mination, tooth shape, visual sensitivity, and color pattern (Streeknan 
and Albertson 2006). This linkage group is orthologous to the Midas 
LG4. It wiU be interesting to test whether these genes and this particular 
linkage group are also important for traits under selection in Neotrop- 
ical cichUds. 

As the number of genetic markers that can be quickly generated 
has exploded with new sequencing technology, the limiting factor for 
constructing a high-resolution linkage map is the biological resources; 
the number of individuals and recombination events. Thousands of 
markers can be produced in a short time with ddRADSeq but high- 
resolution mapping can only be achieved by maximizing the nimiber 
of recombinations, i.e., the individuals. Under laboratory conditions, 
Midas cichlids can spawn naturally almost once per month with more 
than a 300 offspring per clutch, which allows for harvesting a high 
number of individuals. Because of the Midas cichlids' amenable life 
history, we were able to create a genetic map based on more than 340 
F2 individuals from one Fi family. The number of markers that can be 
generated and the amount of missing data that is accepted constitute 
a trade-off. When minimizing missing data, markers that are not 
present in all individuals {e.g., through library effects or coverage 
thresholds) will be lost. Accordingly, while maximizing the nimiber 



of loci, more missing data are accepted, which might result in in- 
correct distances in linkage map construction and in QTL analyses 
(Hackett and Broadfoot 2003). 

To perform synteny analyses and achieve a high marker density, 
we allowed more missing data for the purpose of this study. 
Compared with the five other linkage maps that have been 
constructed with RADSeq technology so far (Amores et al. 2011; 
Baxter et al. 2011; Chutimanitsakun et al. 2011; Pfender et al. 2011; 
Parnell et al. 2012), the map of the present study has the second 
highest number of progeny genotyped for markers and hence a high 
nimiber of recombinations. The gar linkage map consisted of more 
markers (-5500), although it was based on fewer individuals (Amores 
et al. 2011). The average spanning distance between two markers is 
1.95 cM in our study, which is the lowest value published to date for 
a RADSeq study. In comparison with the published cichlid linkage 
maps from A. burtoni (Sanetra et al. 2009), tilapia (Lee et al. 2005), 
and Lake Malawi cichlids (Albertson et al. 2003; Parnell et al. 2012), 
the present Midas cichlid linkage map has the second highest number 
of individuals, the highest number of markers, and lowest average 
distance between two markers. This result highlights the promise of 
next-generation sequencing based genotyping, and especially the 
ddRADSeq protocol, for hnkage mapping experiments when biolog- 
ical material is sufficient. 
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Although A. astorquii and A. zaliosus have clearly distinct ecolo- 
gies, their divergence time is very recent [should be less than 10,000 
years (Barluenga et al. 2006) and maximally 20,000 years (Kutterolf 
et al. 2007)]. For this reason, only -^10.5% of our markers were poly- 
morphic and of those 18.5% were fixed between species. Given this 
young divergence, it is of great interest how these species developed 
and maintain reproductive isolation. Besides premating isolation, 
chromosomal rearrangements or genetic incompatibilities might act 
as isolating barriers to gene flow in postzygotic reproduction 
(reviewed in Coyne and Orr 2004). As the species are young and 
can be crossed easily, incompatible loci seems more probable than 
large genomic rearrangements (Moyle and Graham 2005). Genetic 
incompatibilities between species detected on the basis of segregation 
distortion have been suggested in many cases, including lake whitefish 
(Rogers and Bernatchez 2007; Rogers et al. 2007), Drosophila (Phadnis 
and Orr 2009), Nasonia (Gadau et al. 1999; Niehuis et al. 2008), and 
some plant species (Foolad et al. 1995; Whitkus 1998; Fishman et al. 
2001; Myburg et al. 2004). 

In our linkage mapping analysis, we found that 10% of the markers 
deviated (P < 0.05) from expected Mendelian segregation ratio. Single 
markers deviating from the expected ratio might result from genotyp- 
ing errors, though blocks of deviating linked markers as on LGsl, 6, 
15, 23, and 24 probably have a biological basis and might be caused by 
segregation distortion genes {e.g., Lyttle 1993) or genetic incompati- 
bilities by differential fixation of alleles between parents (Dobzhansky 
1937; Muller 1942; Fishman and Willis 2001). If genomic incompat- 
ibilities between species exist, chromosomal regions should consis- 
tently exhibit segregation distortion. As our data only originates from 
a single family, fiirther testing in laboratory bred or wild A. astorquii 
and A. zaliosus hybrids will be necessary (Danzmann and Gharbi 2001) 
to reveal whether the five linkage groups exhibiting blocks of distorted 
markers result from interspecific genetic incompatibilities. 

Genomic architecture has been suspected of being a main factor 
contributing to the cichlids' evolutionary success in terms of biodiver- 
sity (Salzburger 2009). The comparative genomic analysis between the 
Midas cichlid and the tUapia reveals a highly conserved synteny with 
86% of the markers mapping to a single orthologous linkage group. 
This finding points toward a high level of conserved synteny between 
Neotropical and African cichlids, in spite of their relatively ancient 
split (60 mya according to Vences et al. 2001; Matschiner et al. 2011; 
but see Farias et al. 1999; Salzburger and Meyer 2004; Azuma et al. 
2008). A relatively high level of conservatism in number of chromo- 
somes across cichlid genera and species is observed (see Feldberg et al. 
2003), indicating that interchromosomal rearrangements are rare. 
This observation and our results suggest that at least large-scale chro- 
mosomal rearrangements are probably not a main factor contributing 
to the species richness in cichlids and also agree with previous studies 
demonstrating a high degree of conserved synteny between cichlid 
genomes (Sanetra et al. 2009). As a comparison, the human and lemur 
lineages diverged -74 mya {e.g., Huchon et al. 2007), although their 
number of chromosomes differs considerably {Homo sapiens: 23 chro- 
mosomes; Microcebus murinus: 33 chromosomes), including several 
interchromosomal rearrangements that occurred after their split 
(Dutrillaux 1979). 

Medaka and stickleback show less conserved synteny with the 
Midas cichhd (medaka: 38%, stickleback: 78% markers with conserved 
synteny). Although the data for genomic comparisons between the 
Midas cichlid and the noncichlid fish species stickleback and medaka 
are limited because of the large phylogenetic distance of more than 
100 million years since these hneages last shared a common ancestor 
(Matschiner et al. 2011), it allows for two conclusions: (1) the degree 



of conserved synteny with these species is lower than compared with 
tilapia and (2) medaka shares considerably less conserved synteny 
with the Midas cichlid compared with stickleback. As chromosomal 
rearrangements leading to linkage and synteny disruptions accumu- 
late with time, stickleback (divergence time -140 mya, Matschiner 
et al. 2011) and medaka (divergence time -100 mya, Matschiner 
et al. 2011) display less conserved synteny to the Midas cichlid, as 
expected, compared with the younger split of Afi-ican vs. Neotropical 
cichlids. However, synteny disruption seems to be greater between 
medaka and the Midas cichlid compared with stickleback, which 
was unexpected because most phytogenies place the stickleback basal 
to cichlids and medaka (Chen et al. 2003; Mabuchi et al. 2007; Azuma 
et al. 2008, Matschiner et al. 2011). Incorrect homology determination 
or mapping errors might account for a lower degree of conserved 
synteny between the Midas cichlid and medaka. Alternatively, the 
medaka lineage has undergone considerable more rearrangements 
compared with cichlids and stickleback, and/or phylogenetic relation- 
ships are incorrect, as relationships between these teleosts remain 
controversial (Chen et al. 2003; Steinke et al. 2006). 

The mutation rate (u) estimation of 6.6 x 10~^ per site per gen- 
eration in Midas cichlids indicates a somewhat elevated mutation rate 
compared to estimates from other vertebrates [mouse: 3.8 x 10~^, 
human: 3.0 x 10~^ (Lynch 2010)]. A mutation rate based on the 
synonymous substitution rate and UTRs from pooled Midas cichlid 
expressed sequence tags yielded an estimate (-1.25 x 10~* per site per 
year) higher than that of the present study, although it was thought to 
be elevated by population-level polymorphism (Elmer et al. 2010a). 
For mitochondrial DNA control region, the substitution rate has been 
estimated as 7.1 x 10~^ per site per year (Barluenga and Meyer 2004), 
which, despite the fact that control region should have a high muta- 
tion rate, is fairly well in agreement with our current genome-wide 
mutation rate estimate. However, all calculations of mutation or sub- 
stitution in recendy diverged populations differ because of the role of 
lineage sorting in decreasing rates with time {i.e., pedigree vs. popu- 
lation vs. phylogenetic rates) [see e.g., (Ho et al. 2005; Subramanian 
et al. 2009)]. Although vertebrate estimates only include model species 
such as human, mouse, and rat, this finding is congruent with in- 
terspecific comparisons of mutation rates in teleost compared with 
mammalian lineages that revealed that teleosts exhibit a higher neutral 
mutation rate (Ravi and Venkatesh 2008). It might be argued that an 
elevated mutation rate might explain the teleost morphological diver- 
sity and species richness, since the chance that a beneficial mutation 
arises on which selection can act is increased. More estimates of 
vertebrate mutation rates are needed in order to ascertain whether 
the cichlid lineage, and species-rich lineages in general, exhibit higher 
mutation rates. 

Cichlid fishes represent wonderfiil examples for phenotypic diversity 
and convergence, rapid speciation, and adaptive radiation (Fan et al 
2012). Yet, we have stiU to understand how the underlying genetics of 
this diversity operate to translate into the ubiquitous phenotypic richness 
of this evolutionary lineage. The present linkage map adds valuable 
genomic information to tackle these fundamental questions in evolu- 
tionary biology. Future genome-wide analyses on the level of popula- 
tions, species, and among distantly related lineages in cichlid fishes will 
profit from this resource and it wiU help to shed Ught on the genomics of 
adaptation in this extraordinarily diverse evolutionary lineage. 
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